ABSTRACT

This study was designed to contribute to the validity 
evidence for the Washington Assessment of Student Learning (WASL) by 
providing additional descriptive data about the performance standards in 
reading and mathematics at grades 4, 7, and 10. After the realignment of 
norm-referenced tests, large numbers of students taking the WASL had 
corresponding norm-referenced scores from the previous year. It was possible 
to match samples for both sets of tests. Students' performance on the norm- 
referenced tests consistently showed mathematics performance to be slightly 
higher than reading performance at all grade levels, and performance across 
grade levels for both reading and mathematics was quite similar. Performance 
on the standards-based assessments for reading and mathematics, and across 
grade levels, exhibited marked variations, with mathematics performance 
consistently lower than corresponding grade level reading performance. 
Coefficients suggest a moderately strong relationship between performance on 
the norm-referenced tests and the standards-based assessments given a year 
later. Equipercentile equating of the distributions from both was developed. 
In addition, the percentage of students meeting the performance standard was 
plotted as a function of progressively higher national percentile rank bands. 
Data and portrayals clearly indicate inconsistencies in the difficulty of 
performance standards across grade levels and content areas. The lack of 
vertical comparability for the reading standards at grades 4, 7, and 10 
undermines a belief in their reasonableness. Even though they are more 
consistent, the overall difficulty of the mathematics standards also makes it 
hard to believe that they are reasonable. The difference between reading and 
mathematics performance at grades 4 and 10 also makes it difficult to promote 
these measures as fair. Some of the factors contributing to these problems 
are discussed. (SLD) 
Validity Evidence for Washington Assessment of Student Learning (WASL)
Performance Standard Cut-Scores for Reading and Mathematics 

The performances of schools and districts on the Washington Assessment of Student
Learning (WASL), the state’s standards-based assessment, are the primary
achievement indicators for the state accountability system. In addition, these same scores 
are used as the performance indicators in the accountability system required by the No 
Child Left Behind (NCLB) federal legislation. Critical elements of such standards-based 
assessments are the performance standards, or cut-scores, that categorize the performance 
into a limited number of levels. The NCLB requires a minimum of three levels and 
labels them “basic,” “proficient,” and “advanced.” The categories of “proficient” and 
“advanced” are considered acceptable levels of achievement in these new accountability 
systems. Therefore, the validity of these classifications, and the inferences about students 
and schools that are based on them, are of great importance. 



The cut-scores are typically arrived at through standard-setting procedures based on
judgments. Such decisions are made by panels of judges, primarily educators having 
knowledge of the curriculum standards from which the test content is derived and 
experience teaching students at the grade level being tested. In the current climate of 
high stakes accountability, any number of such performance standards for state tests are 
perceived as unreasonably difficult. Such doubts about the fairness of these performance 
standards raise questions of the validity of the interpretations about students and schools 
that are based on them. 

The WASL was phased in over three consecutive years beginning with 4th grade in the
spring of 1997. In the initial years these assessments were voluntary for schools and 
districts. However, at each grade level, over ninety percent of the students in the state 
participated during these voluntary years. The 4th grade assessment, voluntary in the
spring of 1997, became mandatory in the spring of 1998. The 7th grade assessment was
instituted as a voluntary assessment in the spring of 1998 and did not become mandatory
until the spring of 2001. The 10th grade assessment was the last to be developed and first
appeared as a voluntary program in the spring of 1999. This component also became 
mandatory in the spring of 2001. The performance standards (cut-scores) for these 
assessments were established during the summer immediately following their initial 
administration. 

The Washington State Assessment Program also includes three grade levels of
norm-referenced tests. The Iowa Tests of Basic Skills (ITBS) is administered in the
spring at 3rd and 6th grades and the Iowa Tests of Educational Development (ITED) at
9th grade. These assessments represent a holdover from the prior state assessment
program and used to be administered in grades 4, 8, and 10. However, with the institution of the
standards-based assessments in grades 4, 7, and 10, it was decided a better alignment 
would be to place the “basic skills” assessments in the years prior to the standards-based 
tests. These placements occurred first at the elementary level in the spring of 1999 and in 
the following year at the secondary level. 

The study reported here was designed to contribute to the validity evidence for the 
WASL by providing additional descriptive data about the performance standards in 
reading and mathematics at 4th, 7th, and 10th grades. After the realignment of the
norm-referenced tests, large numbers of students taking the WASL had corresponding
norm-referenced test scores from the previous year. The first such cohort with both the prior
year’s norm-referenced test scores and the corresponding standards-based scores
occurred in the spring of 2000 at 4th grade. In the subsequent spring of 2001 such cohorts
first occurred at 7th and 10th grades. Table 1 shows the percent of students meeting the
state performance standard in grades 4, 7 and 10 for reading and mathematics for all 
students and for the matched sets of students having norm-referenced test scores from the 
prior year for each of these cohorts. These matched samples included only students 
having valid scores for reading and mathematics on both the standards-based assessments 
and the norm-referenced tests. Table 2 shows the ITBS or ITED National Percentile 
Rank (NPR) equivalent of the mean scale scores in grades 3, 6, and 9 for all students and 
the corresponding matched samples represented in Table 1. 
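
As an illustration only, the matching rule described above (a student is retained only with valid reading and mathematics scores on both tests) can be sketched as follows. The record layout, the field names (`student_id`, `reading`, `math`), and the sample values are hypothetical; they are not taken from the state data files.

```python
# Hypothetical records: one dict per student per test. None marks a missing
# or invalid score. Field names and values are illustrative assumptions.

def valid(record):
    """A record counts only if both reading and math scores are present."""
    return record["reading"] is not None and record["math"] is not None


def match_samples(nrt_records, sbt_records):
    """Pair each student's prior-year NRT record with the SBT record.

    Students missing from either file, or lacking a valid reading or math
    score on either test, are dropped from the matched sample.
    """
    nrt_by_id = {r["student_id"]: r for r in nrt_records if valid(r)}
    matched = []
    for sbt in sbt_records:
        nrt = nrt_by_id.get(sbt["student_id"])
        if nrt is not None and valid(sbt):
            matched.append((nrt, sbt))
    return matched


nrt_prior_year = [
    {"student_id": 1, "reading": 190, "math": 195},
    {"student_id": 2, "reading": 172, "math": None},   # invalid math score
    {"student_id": 3, "reading": 201, "math": 188},    # no SBT record
]
sbt_current_year = [
    {"student_id": 1, "reading": 412, "math": 398},
    {"student_id": 2, "reading": 395, "math": 380},
    {"student_id": 4, "reading": 420, "math": 405},    # no prior-year NRT
]

pairs = match_samples(nrt_prior_year, sbt_current_year)
print([nrt["student_id"] for nrt, _ in pairs])
```

Only student 1 survives the match in this toy example, which mirrors why the matched samples in Table 1 are smaller than the statewide populations.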
Students’ performance on the norm-referenced tests (Table 2) consistently shows
mathematics performance to be slightly higher than reading performance at all grade 
levels. In addition, the performance across grade levels for both reading and mathematics 
was quite similar. Performance on the standards-based assessments (Table 1) for reading 
and mathematics, and across grade levels, exhibited marked variations. Performance on 
the mathematics assessments is uniformly lower than the corresponding grade level 
reading performance. Math performance is highest at the elementary level and lowest at 
the middle level. Reading performance is much higher than math at grades 4 and 10. 
Although the 7th grade reading performance is still higher than math at that level, it is
markedly below that for reading at grades 4 and 10. These patterns raise concerns about
the reasonableness of the performance standards for the standards-based assessments,
particularly given the corresponding stability in the norm-referenced test data.

Table 4 shows the correlation coefficients for the norm-referenced and standards-based 
reading and math pairs for the three grade levels for the different matched samples. 

These correlation coefficients remained quite consistent across years with the exception 
of that for reading between the ITBS reading at 3rd grade in 2001 and the WASL reading
at 4th grade in 2002. These coefficients suggest a moderately strong relationship between
the performance on the norm-referenced tests and the standards-based assessments given 
a year later. Based on the size of these coefficients, two additional analyses were 
conducted. 
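
For readers who want to reproduce coefficients of this kind, a minimal sketch follows. It assumes the coefficients are Pearson product-moment correlations computed on matched pairs of scale scores; the paper does not name the coefficient, and the scores below are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical matched pairs: prior-year NRT scale scores and the
# following year's SBT scale scores for the same five students.
nrt_scores = [170, 185, 190, 205, 210]
sbt_scores = [385, 400, 395, 420, 415]
print(round(pearson_r(nrt_scores, sbt_scores), 2))
```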

First, an equipercentile equating of the standards-based assessment distributions and the
corresponding norm-referenced distributions was developed. Table 5 gives the
estimated NPR for the equivalent standards-based assessment cut-score at the
performance standard. When expressed as NPRs, it is clear that the reading cut-scores at
the standard in 4th and 10th grades are at the lower end of what would be considered the
normal or average range of traditional norm-referenced test performance. The math
cut-score at 4th grade appears to be at the upper end of the normal range, as does that for 7th
grade reading. The math cut-scores for the standards-based assessment at both 7th and
10th grade appear to lie slightly above the normal range.
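
The idea behind equipercentile equating can be sketched as follows: find the percentile rank of the cut-score in the standards-based distribution, then find the norm-referenced score holding the same percentile rank in the matched sample. This is a simplified, unsmoothed version under stated assumptions, not the procedure actually used for Table 5. The function names, the cut-score of 400, and the score lists are hypothetical, and converting the resulting scale score to an NPR would additionally require the publisher's norm tables, which are not shown.

```python
def percentile_rank(scores, value):
    """Percent of scores below `value`, plus half of those equal to it."""
    below = sum(1 for s in scores if s < value)
    equal = sum(1 for s in scores if s == value)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def score_at_percentile_rank(scores, rank):
    """Smallest observed score whose percentile rank is at least `rank`."""
    for candidate in sorted(set(scores)):
        if percentile_rank(scores, candidate) >= rank:
            return candidate
    return max(scores)


def equipercentile_equivalent(sbt_scores, nrt_scores, sbt_cut):
    """NRT score with the same percentile rank as `sbt_cut` has on the SBT."""
    rank = percentile_rank(sbt_scores, sbt_cut)
    return score_at_percentile_rank(nrt_scores, rank)


# Hypothetical matched sample of ten students.
sbt = [380, 390, 395, 400, 400, 405, 410, 415, 420, 430]
nrt = [160, 170, 175, 180, 185, 190, 195, 200, 205, 210]

cut_rank = percentile_rank(sbt, 400)              # 40.0 for this toy sample
equivalent = equipercentile_equivalent(sbt, nrt, 400)
print(cut_rank, equivalent)
```

Operational equating would normally interpolate between score points and often smooth the distributions first; the sketch omits both for brevity.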

The second approach to illuminating the relationship between students’ prior year norm- 
referenced test performances and their subsequent standards-based performance involved 
plotting the percentage of students meeting the performance standard as a function of 
progressively higher NPR bands ranging from “1-4” to “95-99.” Figures 1 through 6 
display these relationships. Figures 1 and 2 show the relationships between the 3rd
grade reading (Figure 1) and math (Figure 2) norm-referenced performance and the
corresponding standards-based performance at 4th grade. Figure 1 shows that for reading
the relationship remained very stable across three consecutive years. Figure 2, for 4th
grade math, shows that the first two years remained almost identical, whereas
for 2002 the percent of students meeting the performance standard was systematically
higher for each band except for the two extreme bands. In addition, Figure 2 shows that
the percent of students meeting the standard is below 50% until the “60-65 NPR” band is
reached. By comparison, for 4th grade reading at the “60-65 NPR” band, over 80% of the
students met the standard. 
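
The band-by-band tabulation behind these figures can be sketched as below. The sketch assumes five-point NPR bands ("1-4", "5-9", ..., "95-99"), a hypothetical cut-score of 400, and that a score equal to the cut-score meets the standard; the exact band boundaries used for the figures are not given in the text, and the student data are invented.

```python
def npr_band(npr, width=5):
    """Label the band containing `npr` (1-99): "1-4", "5-9", ..., "95-99"."""
    if not 1 <= npr <= 99:
        raise ValueError("NPR must be between 1 and 99")
    low = (npr // width) * width
    high = low + width - 1
    return f"{max(low, 1)}-{high}"


def percent_meeting_by_band(students, cut_score):
    """Percent of students at or above `cut_score` within each NPR band.

    `students` is an iterable of (prior_year_npr, sbt_scale_score) pairs.
    """
    counts = {}
    for npr, sbt_score in students:
        band = npr_band(npr)
        met, total = counts.get(band, (0, 0))
        counts[band] = (met + (sbt_score >= cut_score), total + 1)
    return {band: 100.0 * met / total for band, (met, total) in counts.items()}


# Hypothetical matched pairs: (prior-year NPR, SBT scale score).
students = [(3, 370), (62, 405), (61, 390), (97, 440), (96, 430)]
result = percent_meeting_by_band(students, cut_score=400)
for band, pct in result.items():
    print(band, pct)
```

Plotting these percentages against the ordered bands, one line per year, would give a display of the kind shown in Figures 1 through 6.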

Both Figures 3 and 4 (7th grade reading and math, respectively) show slight increases in
the percent of students meeting the standard across almost all bands for 2002 compared to
2001. However, the percent of students meeting the standard remains low for both
reading and math until the higher bands of the NPR distribution are reached. This is
particularly pronounced for 7th grade mathematics.

Finally, Figures 5 and 6 show the relationships between the norm-referenced scores and 
the standards-based scores for reading and math, respectively, at 10th grade. The reading
function looks very similar to that at 4th grade except there was more growth between
2001 and 2002 than was shown at 4th grade. Math, on the other hand, shows no growth,
and actually a slight decline in performance, from 2001 to 2002. In addition, the math
function looks much more like that for math at seventh grade except at the higher NPR
bands, where slightly larger percentages of students met the standard at 10th grade than did
for the corresponding bands at 7th grade.

These data and portrayals clearly indicate inconsistencies in the difficulty of the 
performance standards across grade levels and content areas. The lack of vertical 
comparability for the reading standards at grades 4, 7, and 10 undermines a belief in their 
reasonableness. Even though they are more consistent across grade levels, the overall 
difficulty of the mathematics standards also makes it harder to believe that they are 
reasonable. The large difference between the reading and math performance at grades 4 
and 10 also makes it difficult to promote these accountability measures as fair. 

The performance standards for the WASL assessments were set by different
standard-setting committees meeting during the summer in three different years (1997, 1998, and
1999). Furthermore, the standard setters were not allowed to have access to impact data
during their review process. And finally, the policy board responsible for establishing the
performance standards chose not to intervene and moderate the committee
recommendations. These factors no doubt contributed in significant ways to produce the
results described in this paper. Much more attention must be paid to the role of policy
bodies in the setting of performance standards for these new accountability systems. The 
work of the judges during the standard setting sessions must be treated as only one source 
of information about the desired standards. Policy makers must be much better informed 
about their role in exercising the final judgments about these very important decisions. 
They must provide the needed moderation required to arrive at performance standards 
that are perceived as reasonable while at the same time encouraging practitioners to strive
for even greater learning for their students. 
Washington Data: Norm-Referenced & Standards-Based Tests*



Table 1. Standards-Based Tests (SBT) - Percent Met Standard

Year                              Statewide
Introduced  Grade  Subject    2000   2001   2002
1997        4th    Reading    65.8   66.1   65.6
                   Math       41.8   43.4   51.8
1998        7th    Reading    41.5   39.8   44.5
                   Math       28.2   27.4   30.4
1999        10th   Reading    59.8   62.4   59.2
                   Math       35.0   38.9   37.3

Matched Sample (columns: 2000, N, 2001, N, 2002, N), entries as they survive in the source:
4th     71.3   53,092   46.6   71.0   48.1   57,571   56.3
7th     NA   NA   NA   NA   44.5   31.4   343
10th    NA   NA   NA   NA   72.4   53,372   47.2





Table 2. Norm-Referenced Tests (NRT) - NPR Equivalent of Mean Scale Score

Year                              Statewide            Matched Sample
Introduced  Grade  Subject    1999  2000  2001       1999  2000  2001
1999        3rd    Reading     55    56    57         53    55    55
                   Math        60    63    64         58    59    61
2000        6th    Reading     NA    54    53         NA    55    55
                   Math        NA    56    56         NA    57    56
2000        9th    Reading     NA    54    53         NA    59    58
                   Math        NA    60    59         NA    65    64




Table 3. Means and Standard Deviations for NRT and SBT Scale Scores - Matched Samples

                                  Mean                       SD
Grade  Test  Subject    99/00   00/01   01/02     99/00   00/01   01/02
4th    NRT   Reading    187.4   188.1   188.3      19.8    19.5    19.3
             Math       188.7   190.2   190.6      18.4    18.5    18.4
       SBT   Reading    409.3   407.6   409.1      18.9    17.9    19.5
             Math       394.9   397.0   403.8      33.7    33.9    33.0
7th    NRT   Reading     NA     230.6   230.5       NA     27.5    27.7
             Math        NA     232.8   232.4       NA     27.6    27.7
       SBT   Reading     NA     396.7   397.2       NA     19.7    19.1
             Math        NA     374.4   379.3       NA     50.3    47.2
10th   NRT   Reading     NA     268.0   267.0       NA     34.3    34.3
             Math        NA     278.1   276.7       NA     35.9    36.1
       SBT   Reading     NA     413.7   411.3       NA     28.8    30.0
             Math        NA     395.8   393.3       NA     40.1    37.2




Table 4. Correlation Coefficients: Prior Year's NRT and Standards-Based Tests

Grade  Subject    2000   2001   2002
4th    Reading    .72    .72    .66
       Math       .77    .77    .74
7th    Reading    NA     .76    .75
       Math       NA     .83    .83
10th   Reading    NA     .74    .74
       Math       NA     .80    .80

Table 5. Equipercentile Equating: Estimated NPR Equivalents of the SBT Cut Scores

Grade  Subject    2000   2001   2002
4th    Reading    38th   40th   38th
       Math       61st   61st   53rd
7th    Reading    NA     63rd   56th
       Math       NA     72nd   69th
10th   Reading    NA     43rd   45th
       Math       NA     72nd   72nd



*NRT: 3rd & 6th - ITBS; 9th - ITED SBT: 4th, 7th & 10th - Washington Assessment of Student Learning 